Linear Global Translation Estimation with Feature Tracks
This paper derives a novel linear position constraint for cameras seeing a
common scene point, which leads to a direct linear method for global camera
translation estimation. Unlike previous solutions, this method deals with
collinear camera motion and weak image association at the same time. The final
linear formulation does not involve the coordinates of scene points, which
makes it efficient even for large-scale data. We solve the linear equation
based on the L1 norm, which makes our system more robust to outliers in
essential matrices and feature correspondences. We evaluate this method on
both sequentially captured images and unordered Internet images. The
experiments demonstrate its strength in robustness, accuracy, and efficiency.
Comment: Changes: 1. Adopt BMVC2015 style; 2. Combine sections 3 and 5; 3.
Move "Evaluation on synthetic data" out to supplementary file; 4. Divide
subsection "Evaluation on general data" to subsections "Experiment on
sequential data" and "Experiment on unordered Internet data"; 5. Change Fig.
1 and Fig. 8; 6. Move Fig. 6 and Fig. 7 to supplementary file; 7. Change some
symbols; 8. Correct some typos
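The abstract above credits a norm-based solve for robustness to outliers. As a rough illustration only, and assuming an L1-type residual norm, the toy sketch below compares an ordinary least-squares solution with an L1 solution obtained by iteratively reweighted least squares (IRLS) on a one-unknown linear system. The function names, the data, and the choice of IRLS are all assumptions made for this example; this is not the paper's solver.

```python
def solve_l2(a, b):
    """Least-squares solution of a[i] * x = b[i] for a scalar x."""
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)


def solve_l1_irls(a, b, iterations=50, eps=1e-8):
    """Approximate the minimizer of sum |a[i] * x - b[i]| with IRLS.

    Each pass solves a weighted least-squares problem whose weights are the
    inverse of the current absolute residuals, so equations with large
    residuals (likely outliers) contribute less and less.
    """
    x = solve_l2(a, b)  # start from the least-squares estimate
    for _ in range(iterations):
        weights = [1.0 / max(abs(ai * x - bi), eps) for ai, bi in zip(a, b)]
        numerator = sum(w * ai * bi for w, ai, bi in zip(weights, a, b))
        denominator = sum(w * ai * ai for w, ai in zip(weights, a))
        x = numerator / denominator
    return x


# Hypothetical measurements of one unknown; the last one is a gross outlier.
a = [1.0, 1.0, 1.0, 1.0, 1.0]
b = [2.0, 2.1, 1.9, 2.0, 50.0]

x_l2 = solve_l2(a, b)       # pulled far away from 2 by the outlier
x_l1 = solve_l1_irls(a, b)  # stays close to the bulk of the measurements
print(f"L2 estimate: {x_l2:.3f}, L1 estimate: {x_l1:.3f}")
```

With all coefficients equal to one, the least-squares answer is the mean of the measurements and the L1 answer is their median, which is why a single bad measurement shifts the former but barely affects the latter.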
Efficient 2D-3D Matching for Multi-Camera Visual Localization
Visual localization, i.e., determining the position and orientation of a
vehicle with respect to a map, is a key problem in autonomous driving. We
present a multi-camera visual-inertial localization algorithm for large-scale
environments. To efficiently and effectively match features against a pre-built
global 3D map, we propose a prioritized feature matching scheme for
multi-camera systems. In contrast to existing works, designed for monocular
cameras, we (1) tailor the prioritization function to the multi-camera setup
and (2) run feature matching and pose estimation in parallel. This
significantly accelerates the matching and pose estimation stages and allows us
to dynamically adapt the matching efforts based on the surrounding environment.
In addition, we show how pose priors can be integrated into the localization
system to increase efficiency and robustness. Finally, we extend our algorithm
by fusing the absolute pose estimates with motion estimates from a multi-camera
visual-inertial odometry (VIO) pipeline. This results in a system that provides
reliable and drift-free pose estimation. Extensive experiments show that our
localization is fast and robust under varying conditions, and that our
extended algorithm enables reliable real-time pose estimation.
Comment: 7 pages, 5 figures
CP-SLAM: Collaborative Neural Point-based SLAM System
This paper presents a collaborative implicit neural simultaneous localization
and mapping (SLAM) system with RGB-D image sequences, which consists of
complete front-end and back-end modules including odometry, loop detection,
sub-map fusion, and global refinement. In order to enable all these modules in
a unified framework, we propose a novel neural point based 3D scene
representation in which each point maintains a learnable neural feature for
scene encoding and is associated with a certain keyframe. Moreover, a
distributed-to-centralized learning strategy is proposed for the collaborative
implicit SLAM to improve consistency and cooperation. A novel global
optimization framework is also proposed to improve system accuracy, in the
manner of traditional bundle adjustment. Experiments on various datasets demonstrate the
superiority of the proposed method in both camera tracking and mapping.
Comment: Accepted at NeurIPS 202
4D Human Body Capture from Egocentric Video via 3D Scene Grounding
We introduce a novel task of reconstructing a time series of second-person 3D
human body meshes from monocular egocentric videos. The unique viewpoint and
rapid embodied camera motion of egocentric videos raise additional technical
barriers for human body capture. To address those challenges, we propose a
simple yet effective optimization-based approach that leverages 2D observations
of the entire video sequence and human-scene interaction constraints to estimate
second-person human poses, shapes, and global motion that are grounded on the
3D environment captured from the egocentric view. We conduct detailed ablation
studies to validate our design choices. Moreover, we compare our method with the
previous state-of-the-art method on human motion capture from monocular video,
and show that our method estimates more accurate human-body poses and shapes
under the challenging egocentric setting. In addition, we demonstrate that our
approach produces more realistic human-scene interaction.
Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views
Understanding hand-object interaction and the rarely addressed task of novel view
synthesis are highly desirable for immersive communication, but both are
challenging due to the large deformation of the hand and heavy occlusions between
hand and object. In this paper, we propose a neural rendering and pose
estimation system for hand-object interaction from sparse views, which can also
enable 3D hand-object interaction editing. We draw inspiration from recent
scene understanding work showing that a scene-specific model built beforehand can
significantly improve and unblock vision tasks, especially when inputs are
sparse. We extend this idea to the dynamic hand-object interaction scenario and
propose to solve the problem in two stages. We first learn shape and
appearance priors of hands and objects separately with neural
representations in an offline stage. In the online stage, we design a
rendering-based joint model fitting framework to understand the dynamic
hand-object interaction with the pre-built hand and object models as well as
interaction priors, which thereby overcomes penetration and separation issues
between hand and object and also enables novel view synthesis. To obtain
stable contact during the hand-object interaction process in a sequence, we
propose a stable contact loss that keeps the contact region consistent.
Experiments demonstrate that our method outperforms the state-of-the-art
methods. Code and dataset are available on the project webpage:
https://iscas3dv.github.io/HO-NeRF